Nowadays, a major challenge in machine learning is the `Big Data' challenge. In big data problems, due to the large number of data points, the large number of features per data point, or both, the training of models has become very slow. Training time has two major components: the time to access the data and the time to process it. In this paper, we propose one possible solution for handling big data problems in machine learning. The focus is on reducing training time by reducing data access time, through proposed systematic sampling and cyclic/sequential sampling techniques for selecting mini-batches from the dataset. To demonstrate the effectiveness of the proposed sampling techniques, we use Empirical Risk Minimization, a commonly studied machine learning problem, in the strongly convex and smooth setting. The problem is solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (mini-batched SGD), each with two step-size determination techniques, namely a constant step size and the backtracking line search method. Theoretical results prove that systematic sampling and cyclic sampling achieve, in expectation, the same convergence as the widely used random sampling technique. Experimental results on benchmark datasets demonstrate the efficacy of the proposed sampling techniques.
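As a rough illustration of the two mini-batch selection schemes named above, the following sketch shows one plausible formulation (the function names, and the choice of sampling interval `k = n // m`, are our own assumptions, not taken from the paper): systematic sampling draws a single random start and then takes every k-th point, so a mini-batch occupies regularly spaced, largely sequential memory locations, while cyclic/sequential sampling takes consecutive slices of the dataset in order.

```python
import random

def systematic_minibatch(n, m):
    """Systematic sampling (assumed formulation): pick one random start
    in the first interval of length k = n // m, then take every k-th
    point. Only one random draw per batch, and the resulting accesses
    are regularly strided rather than fully random."""
    k = n // m                   # sampling interval
    start = random.randrange(k)  # single random draw
    return [start + j * k for j in range(m)]

def cyclic_minibatch(n, m, t):
    """Cyclic/sequential sampling (assumed formulation): batch t is the
    t-th consecutive slice of size m, wrapping around after n // m
    batches, so data is read strictly in storage order."""
    num_batches = n // m
    first = (t % num_batches) * m
    return list(range(first, first + m))

# Example: n = 100 points, mini-batches of size m = 10.
print(cyclic_minibatch(100, 10, 0))   # first sequential slice
print(systematic_minibatch(100, 10))  # strided batch, random offset
```

Both schemes touch the data in (mostly) storage order, which is the mechanism by which the paper argues data access time is reduced relative to uniform random sampling.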